EMPWC: Expectation Maximization with Particle Swarm Optimization based Weighted Clustering for Outlier Detection in Large Scale Data
Abstract
Outlier detection is usually treated as a pre-processing step for locating those objects in a data set that do not conform to well-defined notions of expected behaviour. It is important in data mining for discovering novel or rare events, anomalies, malicious actions, exceptional phenomena, and so on. Outlier detection for categorical data sets is especially challenging because of the difficulty of defining a meaningful similarity measure. In addition, one may have to determine the optimal number of outliers, that is, how many outliers a data set really has. A possible theoretical approach to this problem is to search over a range of values for the number of outliers (o) and choose an optimal value by optimizing a certain variational property. For this reason, this paper introduces Particle Swarm Optimization (PSO) to search over a range of values of o. The proposed work consists of three major steps: (i) defining a function for the outlier factor, (ii) optimizing the number of outliers, and (iii) clustering for outlier detection. The first step defines a new concept of entropy that takes both Shannon entropy and Jensen-Shannon Divergence (JSD) into consideration. In the second step, PSO is introduced to search for outliers. Here, the swarm consists of N data samples that move around a D-dimensional search space to optimize a certain variational property. Based on this PSO, a function is defined for the outlier factor of an object; it is determined solely by the object itself and can be updated efficiently. Finally, the EMPWC outlier detection method is proposed, which requires no user-defined parameters for deciding whether an object is an outlier. In addition, the EMPWC-based outlier detection methods associate a weight, derived from the entropy function, with each observed data sample. To this end, the weighted-data Gaussian mixture and EM algorithms are introduced. The first considers a weight for each categorical data attribute. The second processes each weight and detects outliers. Experimental results on large-scale categorical data sets demonstrate that the proposed EMPWC-based outlier detection methods can achieve a better trade-off between Detection Rate (DR) and False Alarm Rate (FAR) than state-of-the-art outlier detection approaches.
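The abstract does not give the paper's combined entropy definition, so the following is only a minimal sketch of its two standard ingredients: Shannon entropy of a categorical attribute and the Jensen-Shannon divergence between two distributions over the same categories. The function names and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Empirical probability of each category, in a fixed category order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def shannon_entropy(p):
    """H(p) = -sum_i p_i * log2(p_i), with 0 * log(0) taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H(m) - (H(p) + H(q)) / 2, where m = (p + q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Toy categorical attribute (hypothetical data, not from the paper).
colors = ["red", "red", "red", "blue", "blue", "green"]
categories = sorted(set(colors))

# Distribution over all objects, and with the single rare object removed.
p_all = distribution(colors, categories)
p_without = distribution([c for c in colors if c != "green"], categories)

h_all = shannon_entropy(p_all)
h_without = shannon_entropy(p_without)
jsd = jensen_shannon_divergence(p_all, p_without)
```

In this toy example, removing the single rare value lowers the entropy of the attribute, which is the kind of signal an entropy-based outlier factor can build on. How the paper combines the two quantities, and how PSO and the weighted EM step use them, is not specified in the abstract and is not modelled here.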
Similar resources
A TWO-STAGE DAMAGE DETECTION METHOD FOR LARGE-SCALE STRUCTURES BY KINETIC AND MODAL STRAIN ENERGIES USING HEURISTIC PARTICLE SWARM OPTIMIZATION
In this study, an approach for damage detection in large-scale structures is developed by employing kinetic and modal strain energies together with the Heuristic Particle Swarm Optimization (HPSO) algorithm. Kinetic strain energy is employed to determine the location of structural damage. After the suspected damage locations are determined, the severity of damage is obtained based on variations of modal ...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information-gathering technologies and access to large amounts of data, methods for analyzing data and extracting useful information from large raw data sets are always required, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Research on particle swarm optimization based clustering: A systematic review of literature and techniques
Optimization based pattern discovery has emerged as an important field in knowledge discovery and data mining (KDD), and has been used to enhance the efficiency and accuracy of clustering, classification, association rules and outlier detection. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent beneficiaries. The increasing complexity and la...
Using Particle Swarm Optimization and Locally-Tuned General Regression Neural Networks with Optimal Completion for Clustering Incomplete Data Using Finite Mixture Models
In this paper, a new algorithm is presented for unsupervised learning of Finite Mixture Models from incomplete data sets. This algorithm applies Particle Swarm Optimization to address the local optima problem of the Expectation-Maximization algorithm. In addition, the proposed algorithm uses Locally-tuned General Regression Neural Networks with an Optimal Completion Strategy to estimate missing valu...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method finds fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Triangular fuzzy numbers are also used to represent uncertain data. To compare triangular fuzzy ...